1993-04-01
FASTMAP DOCUMENTATION
by David Curtis
FASTMAP implements an algorithm to provide an approximate
multipoint lod score for a disease against a number of markers
from supplied two-point lod scores. At the time of writing this
algorithm has been accepted for publication in Human Heredity
(Curtis D & Gurling HMD. A procedure for combining two-point lod
scores into a summary multipoint map. Human Heredity 1993; 43;
173-185). You should refer to this for an account of how FASTMAP
works and an evaluation of its performance with real and
simulated linkage data. The algorithm, program and source code
are made freely available, though the source code may not be
commercially exploited. However please cite this publication
when writing up any work for which you have found FASTMAP
useful.
FASTMAP takes as input two-point lod scores from a number of
markers and as output produces a table of estimated multipoint
lod scores, a graph file suitable for graphing these with the
Shareware program EASIGRAF (supplied with EASISTAT) and a
debugging file which contains additional information about the
approximations made. The approximation is produced very quickly,
at least in relation to the time taken to produce a full
multipoint. Overall the approximation is unbiased and is usually
quite accurate, although occasionally there can be a fairly
large difference from the true multipoint lod scores as produced,
for example, by LINKMAP.
The version currently distributed may be regarded to some extent
as a prototype, although I think I have now got it working about
as well as I am going to. I would be extremely interested to
hear any comments concerning its performance. I also hope that
others better qualified than myself may be able to develop the
basic algorithm further and I would be glad to assist anyone in
explaining how the program is supposed to work. This
documentation contains some additional notes about the
implementation which were not included in the article submitted
for publication. Also included with these files are some more
detailed breakdowns of FASTMAP's performance in different
simulations, contained in the file FMAPEVAL.DOC.
COPYRIGHT
I hold the copyright to the source code. I hereby authorise
anyone to use, make adjustments to and redistribute this source
code provided only that they do not do so for profit and that my
original contribution is acknowledged, and that any alterations
from the original are clearly marked. Anyone who wishes to
distribute the code or programs compiled from it for profit may
only do so with prior agreement from me. However the algorithm
and ideas embodied in the source code may be freely used by
anybody for any purpose. Naturally I would hope that such a
person would acknowledge my contribution, and in particular I
would urge anyone who finds the procedure helpful to cite the
relevant reference. I would also be grateful if anyone who did
come up with any useful improvements might keep me informed of
them, although I would be very happy to see others take over
development of this idea.
PROGRAM INPUT
Input is either from the keyboard (standard input which may be
redirected) or from an input file specified on the command
line, e.g.:
fastmap
(then input is typed in interactively)
or:
fastmap < input.dat
or:
fastmap input.dat
When input is from standard input the program prompts the user
for the required values, but the format of the input is
identical regardless of whether it is from the keyboard or a
file.
Line 1:
One to three filenames. The first is for the tabulated output
file of lod score[s] at each map position. The second filename
if specified is a graph file for input into the EASIGRAF
program. The third filename if specified contains debugging
information which reports various aspects of the estimates
obtained by the program.
Line 2:
Values for the minimum and maximum distances (in centimorgans)
of the map over which lod scores are to be calculated. If in
the next line a number of fixed distances are given, then the
only effect of these two values is to define the horizontal
scaling of the graph.
Line 3:
Either: one number, which consists of the number of equidistant
points at which the lod score is to be evaluated between the
minimum and maximum distance given above. Or: several values
giving specific distances (in centimorgans) at which the lod
score is to be evaluated.
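As an illustration of the single-number form of line 3, the
sketch below shows one way equidistant evaluation points could be
generated from the minimum and maximum distances on line 2. It
assumes that both end points are included; FASTMAP.C may space
the points differently. The function name equidistant_points() is
hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: fill pos[0..npoints-1] with map positions (cM)
   evenly spaced from min_cm to max_cm inclusive.  Including both end
   points is an assumption, not taken from FASTMAP.C.
   Returns 0 on success, -1 if npoints is less than 2. */
int equidistant_points(double min_cm, double max_cm, size_t npoints,
                       double *pos)
{
    size_t i;
    double step;

    if (npoints < 2)
        return -1;
    step = (max_cm - min_cm) / (double)(npoints - 1);
    for (i = 0; i < npoints; i++)
        pos[i] = min_cm + step * (double)i;
    pos[npoints - 1] = max_cm; /* avoid rounding drift at the last point */
    return 0;
}
```

With the example input file given later (-20 to 60 cM, 100
points) consecutive positions would be 80/99, about 0.81 cM,
apart under this assumption.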
Line 4:
The number of pedigrees for which data will be input. If only
total lod scores are available then enter 1 here. However
FASTMAP should perform better if the individual lod scores are
available for each pedigree. You can get these from MLINK by
setting byfamily to true and then recompiling.
Line 5:
The name of the disease locus (up to 20 characters), followed
optionally by values for the "reliability" with which genotype
predicts phenotype. If no value for reliability is input then
the program will choose best-fitting values for each pedigree.
If one value is input then this value will be used for every
pedigree. Alternatively, a number of values equal to the number
of pedigrees may be input, in which case each pedigree can be
assigned a different value.
There then follows for each marker (which should be entered in
the order they appear on the map):
One line:
The name of the marker (up to 20 characters), followed by one
value giving the position of the marker on the map (in
centimorgans) followed by either one value giving the
probability that the marker will be informative for a given
meiosis, or alternatively a number of allele frequencies (which
should sum to 1) from which a conventional PIC value is
calculated by the program.
Second and subsequent lines (one for each pedigree):
A number of pairs of values for recombination fraction (in
ascending order) and observed two-point lod score. To indicate
that a marker was uninformative this line should consist of two
zeros (separated by a space). If the marker was not tested in a
particular pedigree this should be indicated by leaving the
line completely blank.
Input finishes when the end of file is reached, or when a blank
line is encountered instead of a line describing the next
marker. Information pertaining to each marker must be entered
in the order in which the markers appear along the map - the
markers must be in order of ascending distance.
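The marker and pedigree lines described above can be read with
standard C string functions. The sketch below is one hypothetical
way to do it, not the parser in FASTMAP.C. parse_marker_line()
treats a single value after the position as the probability of
being informative and two or more values as allele frequencies.
parse_ped_line() distinguishes a blank line (marker not tested), a
"0 0" line (uninformative) and a list of ascending recombination
fraction/lod pairs. The limits MAX_VALUES and MAX_PAIRS are
illustrative stand-ins for the real constants in fastmap.h.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_VALUES 50  /* hypothetical limit on values after the position */
#define MAX_PAIRS  20  /* hypothetical stand-in for MAXPAIRS in fastmap.h */

/* One marker description line: name, position, then either a single
   probability of being informative or a list of allele frequencies. */
typedef struct {
    char   name[21];           /* up to 20 characters plus terminator */
    double position;           /* map position in cM */
    double values[MAX_VALUES];
    int    nvalues;            /* 1 => probability, >1 => allele freqs */
} marker_line;

typedef enum {
    PED_NOT_TESTED,    /* blank line */
    PED_UNINFORMATIVE, /* the line "0 0" */
    PED_PAIRS,         /* recombination fraction / lod pairs */
    PED_ERROR
} ped_kind;

/* Returns 0 on success, -1 for a blank or malformed line (a blank line
   in place of a marker line marks the end of the input). */
int parse_marker_line(const char *line, marker_line *m)
{
    const char *p = line;
    char *end;
    int used;

    if (sscanf(p, "%20s%n", m->name, &used) != 1)
        return -1;
    p += used;
    m->position = strtod(p, &end);
    if (end == p)
        return -1;
    p = end;
    m->nvalues = 0;
    while (m->nvalues < MAX_VALUES) {
        double v = strtod(p, &end);
        if (end == p)
            break;              /* no more numbers on the line */
        m->values[m->nvalues++] = v;
        p = end;
    }
    return m->nvalues >= 1 ? 0 : -1;
}

ped_kind parse_ped_line(const char *line, double theta[], double lod[],
                        int *npairs)
{
    const char *p = line;
    char *end;

    *npairs = 0;
    for (;;) {
        double t, z;
        t = strtod(p, &end);
        if (end == p)
            break;              /* no more numbers */
        p = end;
        z = strtod(p, &end);
        if (end == p)
            return PED_ERROR;   /* recombination fraction without a lod */
        p = end;
        if (*npairs == MAX_PAIRS)
            return PED_ERROR;
        if (*npairs > 0 && t <= theta[*npairs - 1])
            return PED_ERROR;   /* fractions must be ascending */
        theta[*npairs] = t;
        lod[*npairs] = z;
        (*npairs)++;
    }
    while (isspace((unsigned char)*p))
        p++;
    if (*p != '\0')
        return PED_ERROR;       /* trailing non-numeric text */
    if (*npairs == 0)
        return PED_NOT_TESTED;
    if (*npairs == 1 && theta[0] == 0.0 && lod[0] == 0.0)
        return PED_UNINFORMATIVE;
    return PED_PAIRS;
}
```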
PROGRAM CONSTANTS:
The following constants are defined in fastmap.h:
MAXPEDS - the maximum number of pedigrees to be used
MAXMARKERS - the maximum number of markers to be used
MAXPAIRS - the maximum number of pairs of values for
recombination fraction/lod score to be entered on
each line
MAXDISTS - the maximum number of specified distances at
which the lod score can be evaluated (this has no
effect on the number of equidistant points
between the minimum and maximum if that option is
chosen instead)
MINFRACTION - value specifying fraction of information from
a given marker which can be discarded, and
fractional overlap between markers which can be
ignored
If desired these constants can be altered and the program
recompiled.
NOTES ABOUT INPUT
1. Reliability values
The "reliability" value is the probability of observing the
"expected" phenotype for a given genotype in one offspring of
an informative phase known meiosis - the combined probability
of the offspring not being a nonpenetrant carrier nor a
phenocopy. It can take values between 0.5 and 1. In the context
of the complex pedigree from which the two-point lod scores are
obtained, it provides some measure of the extent to which the
disease genotype is known for each individual, given all the
phenotypic information in the pedigree. In a large complex
pedigree, this reliability value may be relatively high despite
penetrance values being low or phenocopy rates high. This is
because there can often be a fairly high degree of certainty of
an individual's genotype, for example because of the pattern of
illness in his children.
The effect of different reliability values is to alter the
sharpness of curvature of the graph of expected lod score against
recombination fraction. High values produce more sharply peaked
curves which (if there are any apparent recombinants) go down
to minus infinity at zero recombination, lower values produce
flattened out curves.
If a reliability value is not specified for a pedigree, FASTMAP
will find the value which gives the best fit to the input lod
score values for all the markers. (Note that reliability values
can only be fitted if at least one marker contains more than
two pairs of recombination fraction/lod score values, otherwise
a reliability value of 1 will be chosen.) If you are dealing
with an incompletely penetrant disease or one with phenocopies
you should begin by letting FASTMAP generate fitted values for
the reliability. Such a fitted value is constrained to lie
between 0.51 and 0.99. If you are dealing with a fully-
penetrant trait then you may wish to specify a reliability of
1.
Fitting the reliability values takes a considerable amount of
time compared to the rest of the procedure. FASTMAP outputs the
values that it has chosen, and if you find that with different
markers the same pedigree always produces about the same
reliability value then you can save time by specifying this
value in the input file. If every pedigree has the same value
then you can just specify one value instead of one for each
pedigree. I find that with moderately complex pedigrees a value
of 0.99 is appropriate even when dealing with a disease with
fairly low penetrance.
2. PIC values, etc
Normally, for each marker FASTMAP calculates a conventional PIC
value from input allele frequencies. This is supposed to
provide a value for the proportion of meioses informative for
the disease locus which can be expected to be also informative
for the marker. However the user does have the option of
entering this probability directly, and there are probably two
circumstances when you may wish to do this.
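The conventional PIC is usually computed as
PIC = 1 - sum(p_i^2) - sum over i<j of 2*p_i^2*p_j^2, where the
p_i are the allele frequencies. A minimal sketch follows, assuming
FASTMAP uses this standard formula (the values it actually used
can be checked in the debugging file). For two equally frequent
alleles it gives the 0.375 quoted later in this section, and for
the five equally frequent alleles of marker MS5H in the example
input file it gives 0.768. The name pic_from_freqs() is
hypothetical.

```c
#include <stddef.h>

/* Conventional polymorphism information content from allele
   frequencies p[0..k-1], which should sum to 1:
   PIC = 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2 */
double pic_from_freqs(const double *p, size_t k)
{
    double sum_sq = 0.0, cross = 0.0;
    size_t i, j;

    for (i = 0; i < k; i++) {
        sum_sq += p[i] * p[i];
        for (j = i + 1; j < k; j++)
            cross += 2.0 * p[i] * p[i] * p[j] * p[j];
    }
    return 1.0 - sum_sq - cross;
}
```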
The first case in which this is desirable is when the two-point
lod score has been derived from more than one allelic system.
If there are two polymorphic systems at the same locus, or very
close to each other, then it may be preferable to calculate
two-point lod scores with them jointly (e.g. with MLINK) rather
than to enter the results separately into FASTMAP. In this case
a joint PIC value should be calculated, for the probability
that at least one system will be informative at a given locus.
This is PIC=1-(1-PIC1)*(1-PIC2). (The PIC values can be
obtained by inspection of the debugging file after the
individual markers have been entered with their allele
frequencies.)
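The joint value is simple arithmetic. The hypothetical helper
below computes it, assuming (as the formula implies) that the two
systems are informative independently of each other. For two
systems each with PIC 0.375 it gives 1 - 0.625*0.625 = 0.609375;
this single probability would then be entered on the marker line
in place of allele frequencies.

```c
/* Probability that at least one of two independent polymorphic systems
   at (or very near) the same locus is informative:
   PIC = 1 - (1 - PIC1) * (1 - PIC2) */
double joint_pic(double pic1, double pic2)
{
    return 1.0 - (1.0 - pic1) * (1.0 - pic2);
}
```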
The second case when one might want to consider not using the
conventional PIC is to my mind much more dubious, and is when
dealing with a recessive disease. It is true that for certain types
of mating the PIC value does not give the true probability for
a meiosis to be informative. For example if two parents who are
carriers of a recessive disease have the same genotype and are
heterozygotes, and if the disease is known to be in phase with
the same marker allele in each parent, then if the child is
affected but is heterozygous for the marker we can conclude
that there has been one recombinant and one nonrecombinant
meiosis. However for a dominant disease we would not be able to
conclude anything from the situation of two such heterozygote
parents (one affected) producing a heterozygote child. There is
thus a case for using a slightly higher value than the
conventional PIC when dealing with recessive diseases. However
the difference from the conventional PIC is small. It is
maximal for a two-allele system with equal allele frequencies,
when I calculate that the proportion of matings between two
carriers producing affected offspring which are informative is
0.469, compared with a conventional PIC of 0.375. However when
dealing with a complex pedigree information will additionally
be obtained from other types of matings for which the ordinary
PIC is probably more appropriate. I would conclude that the
size of the effect is likely to be negligible in practice.
This view is to some extent supported by the simulations
carried out with a recessive disease, which used conventional
PIC values but demonstrated performance which was overall at
least as good as for a dominant disease. Nevertheless, the
option to enter values other than the PIC is available to the
user if desired.
3. Recombination fractions and lod scores
FASTMAP fits a number of recombinant and nonrecombinant meioses
to the observed two-point lod scores, and may fit a reliability
value as well. There are three distinct ways in which this
fitting is accomplished, depending on the number of pairs of
values which are entered for recombination fraction and lod
score.
If only one pair of values is entered then this is taken to be
for the recombination fraction at which the maximum lod score
is obtained. An exact number of recombinant and nonrecombinant
meioses which would produce this maximum lod can readily be
calculated, contingent on a reliability value. It is only
possible to use this form of input if there is an available lod
score at some recombination fraction which is positive. In
addition it is not possible to fit a reliability value which
depends on the curvature of the lod score graph.
If two pairs of values are entered then again it is possible to
find an exact solution which would produce a lod score curve
going through these two points. Again the solution is
contingent on the reliability value specified, which cannot be
fitted. This option can be used even when the lod scores are
all negative. However I would advise against only entering two
pairs of values. The reason is that the shape of the actual and
fitted curves may not be exactly the same, and it is easy to
imagine that producing a solution which passes exactly through
the two points specified may be wildly inaccurate at other
recombination fractions.
When more than two pairs of values are entered, numbers of
meioses are chosen to produce a line which most closely
approximates to the points specified. This closeness is in the
sense that the sum of squares distance between points on the
line and observed lod score values is minimised. In this
situation a reliability value can be fitted as well as the
number of recombinant and nonrecombinant meioses. Because of
the way the closeness of fit is measured, it is possible to
bias the fitting to give more priority to some recombination
fractions than others. For example if many pairs of values at
small recombination fractions were entered then more attention
would be paid to getting the line to fit well at small
recombination fractions than large ones. Actually, since lod
scores at large recombination fractions are relatively small
anyway, it is the lod scores at smaller recombination fractions
which generally have more effect on the values eventually
arrived at. Lod scores at very small recombination fractions can
be very large indeed, so you are (strongly) advised to omit
these (e.g. at recombination fractions less than 0.01).
To summarise, my advice for the information to input would be a
series of lod score values at different recombination fractions
ranging from 0.01 to 0.4. FASTMAP was evaluated using lod scores
at 0.01, 0.05, 0.1, 0.2, 0.3 and 0.4 and this gave satisfactory
results. If three or more pairs of values are given for at
least one of the markers then this allows a reliability value
to be fitted to the shape of the curve. Avoid entering strongly
negative values at very low recombination fractions to avoid
distorting the fitted curve too wildly (the price of this is
that the estimate may be inaccurate very close to the marker
positions, but this is unavoidable).
EXAMPLE INPUT FILE:
UPM6DF.OUT UPM6DF.GRP UPM6DF.DBG
-20 60
100
3
UP
MS5H 0 .2 .2 .2 .2 .2
0.010 -0.9811 0.050 -0.5865 0.100 -0.3641 0.200 -0.1526 0.300 -0.0566 0.400 -0.0127
0.010 -2.8312 0.050 -1.9729 0.100 -1.4685 0.200 -0.8191 0.300 -0.4055 0.400 -0.1670
0.010 -2.4945 0.050 -1.7574 0.100 -1.1036 0.200 -0.3999 0.300 -0.1076 0.400 -0.0125
L6-3 21.2 .43 .57
0.010 -0.0902 0.050 -0.0747 0.100 -0.0579 0.200 -0.0316 0.300 -0.0138 0.400 -0.0034
0.010 -1.5290 0.050 -0.9314 0.100 -0.6415 0.200 -0.3675 0.300 -0.2186 0.400 -0.1029
0.010 0.0000 0.050 0.0000 0.100 0.0000 0.200 0.0000 0.300 0.0000 0.400 0.0000
HD2G 42.4 .24 .76
0.010 -0.0007 0.050 -0.0006 0.100 -0.0005 0.200 -0.0003 0.300 -0.0001 0.400 -0.0000
0.010 -2.3743 0.050 -1.4876 0.100 -0.9106 0.200 -0.3696 0.300 -0.1342 0.400 -0.0296
0.010 -0.8302 0.050 -0.7315 0.100 -0.4721 0.200 -0.0962 0.300 0.0329 0.400 0.0282
OUTPUT FILES
FASTMAP produces up to three output files with the names
specified on the first line of the input file.
1. Table output
The first output file is a simple table of distance against lod
score: total lod score and a breakdown by pedigree. Because the
lod score may be evaluated at a large number of positions (100 in
the example above) the pedigrees are arranged in columns, rather
than rows as might seem more natural.
2. Graph file output
The second file, if specified, is a graph file for input into
EASIGRAF, a Shareware graphing program supplied with the
EASISTAT package (obtainable from me or the same source as you
acquired FASTMAP). This displays a graph of lod score against
distance - again both the total lod score and for each pedigree.
A neat feature is that it also displays each marker on the same
graph. It is run by specifying the name of the graph file on the
command line, e.g.:
EASIGRAF filename.grp
Please consult the EASISTAT documentation for details on
how various aspects of the display may be altered. Essentially,
you can use the "Axes" menu to control aspects of the labelling
and scaling of the X and Y axes, and the "Data" menu to control
which columns are displayed from the graph file (the first
column corresponds to map distance, the second to total lod and
subsequent columns for each pedigree's lod score). If you wish
to only display the total lod score this can be done by pressing
D for the "Data" menu, then pressing 5 to select XY
columns, then entering 1,2 to graph the second column against
the first. Then keep pressing Enter to return to the main menu.
There are a couple of points worth mentioning specifically. The
marker labels are implemented as "floating titles" for EASIGRAF,
which means they always appear in the same position on the
screen. This means that if you change the horizontal scale of
the graph the marker labels will no longer be in the correct
position (you can change the vertical scale with no problems).
When the graph file is first read in by EASIGRAF the horizontal
scale is determined by the minimum and maximum distances which
were entered to FASTMAP on line 2 of the input file. If the data
is regraphed (for instance if you use the "Data" menu to graph
just the total lod score against distance, columns 1 and 2 of
the graph file) then the graph will be rescaled. The new minimum
and maximum distances will then be determined by the smallest
and largest distances for which a lod score was calculated. If
you selected the option to calculate scores at equidistant
points between the minimum and maximum, then the scale of the
graph will be unchanged. However if lod scores were only
calculated for specific points then the smallest and largest of
these distances will determine the new scale and the floating
titles may appear in the wrong place. If you wish to change the
horizontal scaling of the graph, the best way to do it is to run
FASTMAP again with different minimum and maximum distances
specified, otherwise the floating titles for the markers will
appear in the wrong place.
Another point about the marker labels is that if the markers are
close together then the labels may overwrite each other. To fix
this just alter the vertical position of the relevant floating
title. Select "Edit titles" from the "Titles" menu, then select
"Edit TITLEF's". Go through pressing Enter till you get to the
desired label. Leave the text unchanged, but backspace and
change the Y value for the position (e.g. from 0.0 to 0.1) and
retype the rotation to 90. Then press Enter and Escape
appropriately to return to the main menu. The marker label will
be moved up a bit, clear of the other labels.
3. Debug file
The output from this is fairly complex, and should be studied in
conjunction with the source code and description of the
algorithm. A detailed description of its contents is given later
in the documentation.
USING FASTMAP IN PRACTICE
Supplied with these files is a utility program called TABLE
which produces the pairs of recombination fractions and lod
scores needed to input to FASTMAP. It is run on the output of
MLINK, although it does assume that the output from each two-
point analysis will be in a separate results file. To get these
pairs TABLE is run with the /I switch, e.g.:
TABLE filename.res /I
This would make a new file called filename.inp containing the
pairs of values at recombination fractions between 0.01 and 0.4.
Of course you would still have to input the additional
information about the number of pedigrees, etc. Still, things can
be made even easier. The setup I have is to have different files
containing one line of information about each locus (its name,
position and allele frequencies) in one subdirectory. So there
might be a file called F13A.INP with the following contents:
F13A -50 .2 .2 .2 .2 .2
(You do have to be careful that the file has one and only one
line feed at the end of it, otherwise you would get extraneous
blank lines in your input file to FASTMAP.)
Then one can have a couple of simple batch files along the lines
of:
SETUPINP.BAT
echo %1.out %1.grp >%1.inp
echo %2 %3 >>%1.inp
echo %4 >>%1.inp
echo %5 >>%1.inp
echo %6 %7 >>%1.inp
and:
ADDINP.BAT
type d:\ls4\%2.inp >>%1.inp
table %3.res /i
type %3.inp >> %1.inp
These assume that the one line files for each locus are in the
directory D:\LS4.
Then a batch file which will take all the relevant two-point
results files, prepare them to make an input file for FASTMAP
and run FASTMAP could look like this:
DOFAST6.BAT
CALL SETUPINP EPHDALL -80 60 100 25 EPHD
CALL ADDINP EPHDALL F13A EPHDF13A
CALL ADDINP EPHDALL 6S89 EPHD6S89
CALL ADDINP EPHDALL 6109 EPHDF109
CALL ADDINP EPHDALL 6105 EPHDF105
CALL ADDINP EPHDALL 6S10 EPHD6S10
CALL ADDINP EPHDALL C4 EPHDC4
CALL ADDINP EPHDALL DQA EPHDDQA
CALL ADDINP EPHDALL TCTE EPHDTCTE
FASTMAP EPHDALL.INP
The call to SETUPINP.BAT produces the first few lines of
EPHDALL.INP, with no "reliability" value specified. Each of the
following lines calls ADDINP.BAT for one marker: it takes the
one-line locus description in D:\LS4\F13A.INP etc. and appends it
to EPHDALL.INP, then runs TABLE on EPHDF13A.RES etc. and appends
the resulting file, e.g. EPHDF13A.INP, onto EPHDALL.INP. Finally
FASTMAP is run with EPHDALL.INP as input.
Of course you don't have to go to these lengths, but as you grow
more familiar with FASTMAP you might like to bear these examples
in mind.
PROBLEMS WITH FASTMAP
If FASTMAP seems to be producing poor approximations, there are a
number of things you may want to
look at. Certainly it may be helpful to examine the debugging
file to see if any information gives a clue as to what may be
happening. You can check how good FASTMAP is at fitting to the
supplied two-point data by only inputting the data for one
marker at a time and checking to see how closely the output
corresponds to the input. If you have supplied a "reliability"
value then it would be worth removing this and letting FASTMAP
fit to the supplied lod score values with the reliability
unconstrained. Make sure that whenever possible you enter
information by pedigree, rather than as total lod scores summed
over all pedigrees. However there are some occasions when FASTMAP
will not produce a very good approximation, for example if there
just happens to be an unexpectedly large number of
recombinations between markers, or if two markers just happen to
be informative for all the same matings, and so on. I would be
interested to see examples of such bad performance, to see if
there are any further improvements which could be made.
DETAILED CONTENTS OF DEBUG FILE
It contains the following information:
For each marker, the proportion of meioses for which it is
expected to be informative. (This may either be input directly
by the user, or is the PIC value calculated from the allele
frequencies supplied instead.)
All the following information is repeated once for each
pedigree.
The reliability value is output, which may be supplied by the
user or fitted by the program.
For each marker the estimated equivalent number of recombinant
and nonrecombinant meioses that would produce lod scores close
to those observed is output.
The total estimated number of meioses informative for the
disease locus is output (based on the estimated number of
informative meioses for each marker and the probability of each
marker being informative).
For each marker, based on this total, the fraction of meioses
for which that marker is deemed to be actually informative.
The following information is repeated once for every interval on
the map. Information pertaining to each marker to the right of
the disease locus goes into one column, and each to the left in
a row. The information consists of the number of recombinant
meioses which are expected to be informative for a given marker,
and for no other marker between it and the disease locus.
The top row and leftmost column are for the meioses which are
only informative for a marker in the right group or in the left
group (but not both). In the top row the number of
nonrecombinants with each right-hand marker is printed just
above and to the left of the number of recombinants. In the
leftmost column the number of nonrecombinants with each marker
is two lines above the number of recombinants.
The first set of values, which concerns the first interval, will
all be in one row. The first pair of numbers is the estimated
number of nonrecombinants and recombinants for the first marker.
The second pair relates to the second marker, but excludes those
meioses for which the first marker is expected to have already
been informative, and so on.
Reading down each column and along each row into the table one
can see the meioses which are expected to be informative for a
marker in the lefthand group and in the righthand group
simultaneously. These meioses are categorised as to whether they
are nonrecombinant or recombinant for each marker. Here is an
example debug file containing information about 1 pedigree and 3
markers:
DQA.prob_inf=0.600000
C4.prob_inf=0.600000
6S10.prob_inf=0.600000
ped 1, "reliability" = 0.990:
DQA: 0.837 nonrec, 0.000 rec
C4: 0.837 nonrec, 0.000 rec
D6S10: 0.000 nonrec, 1.599 rec
Estimated total informative meioses for ped 1: 1.914425
DQA.fraction_used: 0.437224
C4.fraction_used: 0.437224
D6S10.fraction_used: 0.835266
0.837 0.471 0.000
0.000 0.000 0.597
0.471 0.000
0.000 1.006
0.423 0.364 0.000
0.000 0.049
0.000 0.000 0.000
0.000 0.000
0.000
1.006
0.294 0.000
0.543
0.000 0.000
0.000
0.421 0.000
0.048
0.000 0.000
0.000
0.000
1.599
0.294
0.000
0.000
0.000
The first three lines say that each marker had a probability of
0.6 of being informative (this information had been entered
directly). The "reliability" value was set to be 0.99. From the
observed lod scores, the estimated equivalent numbers of meioses
were 0.837 nonrecombinants with no recombinants for the first
two markers, and 1.599 recombinants with no nonrecombinant
meioses for the third. The estimated total number of potentially
informative meioses in the whole pedigree was 1.91, yielding the
stated values for the fractions for which each marker actually
was informative. (So the third marker, with a higher estimated
total number of meioses, turned out to be slightly more
informative than expected, while the first two were slightly
less.)
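The fractions in this example appear to be each marker's
estimated number of meioses (nonrecombinant plus recombinant)
divided by the estimated total: 0.837 / 1.914425 is about 0.4372
and 1.599 / 1.914425 is about 0.8352, matching the printed values
to the precision of the rounded counts. This relationship is
inferred from the numbers shown rather than taken from the source
code, and the hypothetical helper below does not attempt to
reproduce how the total itself is estimated.

```c
/* Fraction of a pedigree's potentially informative meioses for which a
   marker is deemed actually informative, as inferred from the example
   debug output: (nonrec + rec) / total.  Returns 0 if total is not
   positive. */
double fraction_used(double nonrec, double rec, double total)
{
    if (total <= 0.0)
        return 0.0;
    return (nonrec + rec) / total;
}
```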
The first row shows the likely distribution of these meioses.
The first marker has 0.837 nonrecombinants. The second marker
has 0.471 remaining from its original 0.837 once we have
excluded the ones for which the first was informative. By the
time we get to the third marker there remain 0.597 of its
recombinant meioses for which neither of the first two were
informative. (We expect that some of the meioses which were
nonrecombinant at the position of the first and/or second
markers may have become recombinant by the time we get to the
third. Incidentally, although the distances are not shown in the
debugging file there is a recombination fraction of 0.01 between
the first two markers and 0.04 between the second and third.)
Now we move on to the next interval. Here we see that there are
0.364 for which the first two markers are both nonrecombinant.
The first marker is now in the leftmost group. There are another
0.049 meioses for which it is nonrecombinant and the third
marker is recombinant, and there are 0.423 meioses for which it
is nonrecombinant and no other marker is informative. The third
marker is also recombinant for 1.006 meioses which are not
informative for either of the first two.
In the next interval we again see the 1.006 recombinant meioses
for which only the third marker is informative. There are 0.543
meioses for which it is recombinant and the second marker is
nonrecombinant, and another 0.048 for which the first marker is
nonrecombinant. There are 0.294 meioses for which the second
marker is nonrecombinant and the third uninformative. There are
0.421 meioses for which the first marker is nonrecombinant and both
the others noninformative.
In the final interval all markers are now in the lefthand group.
We begin with the third marker which has 1.599 recombinant
meioses. Excluding these, there remain 0.294 meioses which are
nonrecombinant for the second marker. On this occasion we
estimate that there are no meioses which are informative for the
first marker and neither of the others.
NOTES ABOUT IMPLEMENTATION
FASTMAP.EXE is a DOS executable which should run on any IBM PC
compatible running MSDOS. If a maths coprocessor is present it
will speed up calculations, but it is not required. I have been
running it on a 486 which gives good performance - estimated
multipoints using 25 pedigrees and 8 markers with reliability
values to be fitted by the program were produced in 70 seconds.
The Sun SPARCServer I have access to produced the same results
in 13 seconds.
The file FASTMAP.C is supplied and should compile OK on most
compilers with little if any modification. I have compiled it
with the Zortech DOS compiler and on a Sun. If you compile it on
a DOS machine you may want to ensure that a large stack is
provided, and you should use a large memory model so there is
room for the data tables.
FASTMAP.H begins with a few #defines to control compilation. You
may want to modify these for your own compiler. The issues are
whether the compiler can accept ANSI C/C++ style prototypes,
whether it can use enums (this is pretty unimportant), and where
to find a prototype for calloc (mine is in stdlib.h). There may
also be compiler-specific ways to modify the stack size, and
with the Zortech compiler this is accomplished with the
_stack=30000 statement. Some libraries contain the function
index() instead of strchr(). Both do the same thing, so you may
need to use the "#define strchr index" statement.
As well as declaring functions and variables, the header file
defines a few program constants (listed above) which can be
changed if desired.
A general point about coding style is that I have tended to keep
a fair amount of information in structures, which are passed to
functions either by value or reference. This largely reflects my
exposure to C++ and an attempt to make the code somewhat object-
orientated. This and other factors may mean that the code is not
as efficient as it could be, but on the other hand it should
make it easier to modify if improvements can be found for the
basic algorithm. Another slight inefficiency may be the liberal
use of doubles rather than floats. A major reason for this is
that I have used the old-fashioned argument-passing style so
that the code will be compatible with K & R compilers. However
ANSI compilers will then report errors if arguments are declared
as floats (what happens is that all float arguments are
passed as doubles and that ANSI compilers will not make the
automatic cast back to float when this style of argument-passing
is used). Since using doubles does not actually incur a
prohibitive overhead, I have tended to use them throughout to
avoid having to worry about this problem.
I have now commented the code fairly comprehensively, and I hope
that in conjunction with the paper it should be possible to work
out what is going on.
AVAILABILITY
FASTMAP is available directly from me on receipt of a formatted
floppy disk. However I would prefer people to obtain it from one
of the software libraries on the Internet listed below. The EASISTAT
package is available from the same sources, but requires another
720 K of disk space, so if you wish to obtain it from me then
please enclose the appropriate number of extra formatted disks.
gene-server:
Internet gene-server@bchs.uh.edu
BITNET/
EARN gene-server%bchs.uh.edu@CUNYVM
UUCP gene-server@bchs.UUCP (new style)
Send mail with Subject: SEND DOS HELP
Anonymous ftp: ftp.bchs.uh.edu
The following are mirror sites for the above collection.
European:
Anonymous FTP: nic.funet.fi
E-mail: mailserver@nic.funet.fi
Send mail message: HELP
European EMBL server:
NetServ@EMBL-Heidelberg.DE
Send mail message: DIR DOS_SOFTWARE
Anonymous ftp: ftp.embl-heidelberg.de (/pub/software/dos)
Manager: Rainer Fuchs, Fuchs@EMBL-Heidelberg.DE
Problems: NetHelp@EMBL-Heidelberg.DE
USA anonymous FTP: ftp.bio.indiana.edu
Please feel very free to contact me (email preferred) with
comments, questions, etc. I would be very interested in people's
views on how well it performs and how useful (or not) it is.
Dave Curtis
Academic Department of Psychiatry
St Mary's Hospital Medical School
Praed Street
London W2 1NY, England Phone: 071 725 1638
Janet: dcurtis@UK.AC.CRC
Elsewhere: dcurtis@CRC.AC.UK
EARN/Bitnet: dcurtis%CRC@UKACRL
Usenet: ...!mcsun!ukc!mrccrc!D.Curtis